Visualizing data in high-dimensional space (Hilbert space, \(\mathbb{R}^p,~~p>3\), where \(p\) is the number of numeric variables) can be messy and unintuitive. An analysis of high-dimensional data should remain interpretable in terms of the original dimensions and should ideally use all of the information in the data.
To these ends we advise the use of projection pursuit as implemented in the R package tourr (H. Wickham, D. Cook, et al., 2011). Further, we implement a method for manual controls, following D. Cook & A. Buja (1997), in the R package spinifex, currently available with devtools::install_github("nspyrison/spinifex"). We also compare and contrast alternative methodologies, namely principal component analysis (PCA; K. Pearson, 1901), t-distributed stochastic neighbor embedding (t-SNE; L. van der Maaten & G. Hinton, 2008), and the holes-optimized tour (an application of projection pursuit; J. Friedman & J. Tukey, 1974). The grand tour was proposed by D. Asimov (1985).
The R package tourr (H. Wickham, D. Cook, et al., 2011) provides a means to animate 2-d projections of a rotating p-dimensional data object. The path of rotation may take the form of a random walk, a predefined path, or the optimization of an index by (semi-stochastic) gradient descent (projection pursuit, described above).
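To make the idea of a 2-d projection and an index concrete, the following base R sketch draws one random orthonormal basis, projects simulated data onto it, and scores the result with a holes-style index. The helper names `random_basis` and `holes_index` are hypothetical and are not part of tourr. The index follows the form commonly given for the holes index, which assumes sphered data, and it may differ in detail from the tourr implementation.

```r
# Base R only. Simulated data stand in for a real p-dimensional data set.
set.seed(1)
n <- 200
p <- 5
X <- scale(matrix(rnorm(n * p), nrow = n, ncol = p))  # centered, unit variance

# Hypothetical helper: a random p x d orthonormal basis, obtained by
# orthonormalizing a Gaussian matrix with a QR decomposition.
random_basis <- function(p, d = 2) {
  qr.Q(qr(matrix(rnorm(p * d), nrow = p, ncol = d)))
}

# Hypothetical helper: a holes-style index for an n x d projection Y.
# Assumes the data were sphered before projecting. Larger values suggest
# fewer points near the center of the projection.
holes_index <- function(Y) {
  d <- ncol(Y)
  (1 - mean(exp(-0.5 * rowSums(Y^2)))) / (1 - exp(-d / 2))
}

A <- random_basis(p, d = 2)  # columns are orthonormal: t(A) %*% A = I
Y <- X %*% A                 # n x 2 projection, the frame shown on screen
idx <- holes_index(Y)        # one number summarizing this projection
```

A tour animates a sequence of such bases. A guided tour, loosely speaking, prefers moves toward bases that increase the index, whereas a grand tour picks target bases at random.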
\(Work~in~progress,~~TODO:~add~to,~cleanup\)
H. Wickham, D. Cook, H. Hofmann, and A. Buja (2011). tourr: An R package for exploring multivariate data with projections. Journal of Statistical Software, 40(2). http://www.jstatsoft.org/v40
D. Asimov (1985). The grand tour: a tool for viewing multidimensional data. SIAM Journal on Scientific and Statistical Computing, 6(1), 128–143.
D. Cook, & A. Buja (1997). Manual Controls for High-Dimensional Data Projections. Journal of Computational and Graphical Statistics, 6(4), 464–480. https://doi.org/10.2307/1390747
H. Wickham, D. Cook, and H. Hofmann (2015). Visualising statistical models: Removing the blindfold (with discussion). Statistical Analysis and Data Mining, 8(4), 203–225.
Thanks
Prof. Dianne Cook - Guidance, inspiration, and contributions to projection pursuit
Dr. Ursula Laa - Collaboration, use cases, and development feedback
Other reading
\(TODO:~scale~output~of~spinifex::proj_data(),~case~handling~for~spinifex::slideshow(),~apply~Phys~data.\)
| Method | Interpretable | Lossy | Global | Overfittable | Handles nonlinear data |
|---|---|---|---|---|---|
| PCA | TRUE | TRUE | TRUE | FALSE | FALSE |
| t-SNE | FALSE | NA | FALSE | TRUE | TRUE |
| Tour, holes | TRUE | FALSE | TRUE | FALSE | FALSE |
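As an illustration of the PCA row, the sketch below uses base R and the built-in iris data (not the flea data, which needs tourr or spinifex installed). The loadings express each component in terms of the original variables, and keeping only two components discards the remaining variance, which is the sense in which a two-component view loses information. Variable names are illustrative.

```r
# Base R only: PCA on the four numeric iris measurements.
dat <- iris[, 1:4]
pca <- stats::prcomp(dat, center = TRUE, scale. = TRUE)

# Loadings: each column is a linear combination of the original variables,
# so the components can be read in terms of those variables.
loadings <- pca$rotation[, 1:2]

# Share of total variance kept by the first two components.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
retained <- sum(var_explained[1:2])

# 2-d view. Species is used only for color and point character,
# never by the method itself.
proj <- as.data.frame(pca$x[, 1:2])
proj$Species <- iris$Species
plot(proj$PC1, proj$PC2,
     col = as.integer(proj$Species), pch = as.integer(proj$Species),
     xlab = "PC1", ylab = "PC2")
```

Because the basis is fixed by the variance structure alone, running this twice gives the same view, unlike t-SNE, whose embedding depends on a random start and tuning parameters.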
f <- tourr::flea[, 1:6]  # six numeric measurements; species is excluded
f.pca <- stats::prcomp(f)
ggplot2::ggplot(as.data.frame(f.pca$x)) + ...
f.tsne <- Rtsne::Rtsne(f, ...)  # the embedding is returned in f.tsne$Y
f.tsne.pca <- stats::prcomp(f.tsne$Y)
ggplot2::ggplot(as.data.frame(f.tsne.pca$x)) + ...
f.holes_end <- tourr::animate_xy(f, tourr::guided_tour(tourr::holes()))  # older tourr versions pass holes rather than holes()
ggplot2::ggplot(f.holes_end) + ...  # f.holes_end may first need converting to a data frame of projected points

Data set: flea

Consists of 74 observations of 6 length measurements taken across 3 species of flea beetle. Within the graphics, species is used to select color and point character, but the methods discussed are all unsupervised (they cannot use species). Data from A. Lubischew (1962); analogous to R. Fisher's iris data [150x5] (1936). The flea dataset is available in the tourr and spinifex R packages.